Comparing an HMM and an SCFG

Authors

  • Arun Jagota
  • Rune B. Lyngsø
  • Christian N. S. Pedersen
1 Baskin Center for Computer Science and Engineering, University of California, Santa Cruz, CA 95064, U.S.A. E-mail: {jagota,rlyngsoe}@cse.ucsc.edu
2 BRICS, Department of Computer Science, University of Aarhus, Ny Munkegade, DK-8000 Århus C, Denmark. E-mail: cstorm@brics.dk

Abstract

Stochastic models are commonly used in bioinformatics, e.g. hidden Markov models for modeling sequence families or stochastic context-free grammars for modeling RNA secondary structure formation. Comparing data is a common task in bioinformatics, and it is thus natural to consider how to compare stochastic models. In this paper we present the first study of the problem of comparing a hidden Markov model and a stochastic context-free grammar. We describe how to compute their co-emission or collision probability, i.e. the probability that they independently generate the same sequence. We also consider the related problem of finding a run through a hidden Markov model and a derivation in a grammar that generate the same sequence and have maximal joint probability, by a generalization of the CYK algorithm for parsing a sequence by a stochastic context-free grammar. We illustrate the methods by an experiment on RNA secondary structures.

1 Introduction

The basic chain-like structure of the key biomolecules, DNA, RNA, and proteins, allows an abstract view of these as strings, or sequences, over finite alphabets, obviously of finite length. Furthermore, these sequences are not completely random, but exhibit various kinds of structures in different contexts. E.g. a family of homologous proteins is likely to have similar amino acid residues in equivalent positions; an RNA sequence will have pairs of complementary subsequences to form base pairing helices. Hence, it is natural to consider applying models from formal language theory to model different classes of biological sequences. Though not completely random, biological sequences can still possess inherent stochastic traits, e.g. due to mutations in a family of homologous sequences or a lack of knowledge (and computing power) to correctly model all aspects of RNA secondary structure formation. Thus, it is often better to use stochastic models giving a probability distribution over all sequences, where a high


Similar resources

Stochastic Models of Video Structure for Program Genre Detection

In this paper we introduce stochastic models that characterize the structure of typical television program genres. We show how video sequences can be represented using discrete-symbol sequences derived from shot features. We then use these sequences to build HMM and hybrid HMM-SCFG models which are used to automatically classify the sequences into genres. In contrast to previous methods for usi...


Evaluation of the Hidden Markov Model for Detection of P300 in EEG Signals

Introduction: Evoked potentials elicited by stimulating the brain can be utilized as a communication tool between humans and machines. Most brain-computer interface (BCI) systems use the P300 component, which is an evoked potential. In this paper, we evaluate the use of the hidden Markov model (HMM) for detection of P300. Materials and Methods: The wavelet transforms, wavelet-enhanced indepen...


Alert correlation and prediction using data mining and HMM

Intrusion Detection Systems (IDSs) are security tools widely used in computer networks. While they seem to be promising technologies, they pose some serious drawbacks: When utilized in large and high traffic networks, IDSs generate high volumes of low-level alerts which are hardly manageable. Accordingly, there emerged a recent track of security research, focused on alert correlation, which ext...


Identification of Verb-Particle Constructions in English

We propose different syntax-based methods for automatically identifying verb-particle constructions in English. The methods are based on the Deterministic Finite-state Automaton (DFA), Hidden Markov Model (HMM), and Synchronous Context-Free Grammar (SCFG). Our experiments show that the methods can achieve an F-score of 83.3% on our manually annotated test set consisting of Wikipedia articles and B...


A New Fast and Efficient HMM-Based Face Recognition System Using a 7-State HMM Along With SVD Coefficients

In this paper, a new Hidden Markov Model (HMM)-based face recognition system is proposed. As a novel point, in contrast to the five-state HMM used in previous research, we use a 7-state HMM to cover more details. Indeed, we add two new face regions, eyebrows and chin, to the model. As another novel point, we use a small number of quantized Singular Value Decomposition (SVD) coefficients as feature...


Identifying novel sequence variants of RNA 3D motifs

Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (S...




Publication date: 2007